(54) Title: LOW FALSE ALARM RATE VIDEO SECURITY SYSTEM USING OBJECT CLASSIFICATION



(57) Abstract 

A video detection system (10) and method detect an intruder from video
images of a scene. The method employs a recognition process to differentiate
between humans and animals. The method is used only after possible false
alarms resulting from the effects of noise, aliasing, non-intruder motion
occurring within the scene, and the effects of global or local lighting
changes have been identified. The object recognition process includes
determining the regions containing a potential intruder, outlining and
growing those regions to encompass all of the potential intruder,
determining a set of shape features from the region and eliminating possible
shadow effects, normalizing the set, and comparing the normalized set with
sets of features of humans and animals. This comparison produces a
confidence level indicating a human intruder. An alarm is given for a
sufficiently high confidence level. The possibility of a false alarm due to
an animal or a non-identifiable object is also substantially eliminated.
LOW FALSE ALARM RATE VIDEO SECURITY SYSTEM USING OBJECT CLASSIFICATION

Technical Field 

This invention relates to video security systems and a method for
detecting the presence of an intruder in an area being monitored by the
system; and more particularly, to i) the rejection of false alarms which might
otherwise occur because of global or local, natural or manmade, lighting
changes which occur within a scene observed by the system, ii) the discernment
of an intruder based upon sensed surface differences which occur within the
scene rather than lighting changes which may occur therewithin, and iii) the
classification of an intruder detected by the system as either human or
non-human, and the provision of an alarm if the intruder is classified as a human.
Background Art 

A security system of the invention uses a video camera as the principal
sensor and processes a resulting image to determine the presence or
non-presence of an intruder. The fundamental process is to establish a reference
scene known, or assumed, to have no intruder(s) present. An image of the
present scene, as provided by the video camera, is compared with an image of
the reference scene and any differences between the two scenes are ascertained.
If the contents of the two scenes are markedly different, the interpretation is
that an intrusion of some kind has occurred within the scene. Once the
possibility of an intrusion is evident, the system and method operate to first
eliminate possible sources of false alarms, and to then classify any remaining
differences as being the result of a human or non-human intrusion. Only if a
determination is made that the anomaly results from a human intrusion is a
notification (alarm) made. All other anomalies which produce a difference
between the two images are identified as false alarms for which no notification
is given.

One issue addressed in making the determination is the possibility of
false alarms caused by lighting changes within a scene, whether natural or
unwanted detections due to non animal/non-human caused motion are
eliminated, it is still necessary to differentiate between a class of human motion 
and a class of non-human or animal motion. Only by doing so can intrusions 
resulting from human actions properly cause an alarm to be given and false 
alarms resulting from animal movements not be provided. 

Previous attempts have been made to provide a reliable security system.
These systems have relied upon contact break mechanisms or PIR (passive
infrared) motion sensors to detect intruder presence. Examples of the use of
infrared devices, either as a passive element or as a scanning device, are
disclosed in U.S. patents 5,283,551, 5,101,194, 4,967,183, 4,952,911,
4,949,074, 4,939,359, 4,903,990, 4,847,485, 4,364,030, and 4,342,987. More
recently, however, the realization that an image processor is required to
transmit the video for confirmation purposes has led to the development of
using the image processor to actually detect the possible presence of an
intruder. Such a system has an economy of hardware and obviates the need for
PIR sensors or contact breaker devices. A security system of this type has
comparable performance to a PIR counterpart. However, there are areas where
considerable benefits accrue if false alarms, which occur due to the erroneous
injection of light into the scene without the presence of an intruder, are
reduced or eliminated.

The cause of these false alarms stems from the sensor and methodology
used to ascertain if an intrusion has occurred. As stated earlier, a past image of
the scene being surveyed is compared with the present scene as taken from the
camera. The form of comparison is essentially a subtraction of the two scenes
on a pixel by pixel basis. Each pixel represents a gray level measure of the
scene intensity that is reflected from that part of the scene. Gray level intensity
can change for a variety of reasons, the most important being a new physical
presence within a particular part of the scene. Additionally, the intensity will
change at that location if the overall lighting of the total scene changes (a global
change), or the lighting at this particular part of the scene changes (a local
change), or the AGC (automatic gain control) of the camera changes, or the
detect the presence of an object. Once an anomaly is detected because of
differences in the comparison of an original and a later image, the system
automatically dials and sends a difference image, provided the differences are
large enough, to a remote site over a telephone line. At the remote site, the
image is viewed by a human. While teaching some aspects of detection, Yausa
et al. does not go beyond the detection process to attempt to use image
processing to recognize that the anomaly is caused by a human presence.

U.S. patent 4,257,063, which is directed to a video monitoring system
and method, teaches that a video line from a camera can be compared to the
same video line viewed at an earlier time to detect the presence of a human.
However, the detection device is not a whole image device, nor does it make
any compensation for light changes, nor does it teach attempting to
automatically recognize the contents of an image as being derived from a
human. Similarly, U.S. patent 4,161,750 teaches that changes in the average
value of a video line can be used to detect the presence of an anomalous object.
Whereas the implementation is different from the '063 patent, the teaching is
basically the same.

All of these previous attempts at recognition have certain drawbacks,
whether in the type of imaging, method of processing, etc., which would result
in either an alarm not being provided when one should be, or in false alarms
being given. The system and method of the present invention overcome these
problems or shortcomings to reliably provide accurate indications of human
intrusion in an area being monitored by a security system. Such an approach is
particularly cost efficient because it reduces the necessity of guards having to
patrol secured areas (which means each area will be observed only on an
infrequent basis unless there are a large number of guards), while ensuring that
any intrusion in any area is not only observed, but an appropriate alarm is
sounded in the event of a human intrusion.
Disclosure of the Invention 

Among the several objects of the present invention may be noted the
provision of a video security system and method for visually monitoring a
scene and detecting the presence of an intruder within the scene;

the provision of such a system and method whose operation is based
upon the premise that only the presence of a human intruder is of consequence
to the security system, with everything else constituting a false alarm;

the provision of such a system and method to readily distinguish between
changes within the scene caused by [...] as opposed to changes within the
scene resulting from lighting changes (whether global or local, natural or
manmade) and other anomalies which occur within the scene to detect the
presence of an intruder;

the provision of such a system and method to employ a recognition
process rather than an abnormality process such as used in other systems to
differentiate between human and non-human objects, so to reduce or
substantially eliminate false alarms;

the provision of such a system and method to provide a high probability
of detection of the presence of a human, while having a low probability of false
alarms;

the provision of such a system and method which provides image
processing such that false alarms resulting from the inadvertent presence of
artifacts as caused by noise, aliasing, non-intruder motion occurring within the
scene, are identified and do not provoke a system response;

the provision of such a system and method which, once an intruder has
been detected, further classifies the intrusion as resulting from the presence of a
human life form, or the presence of non-human life forms, such as shadows,
dogs, cats, rats, mice, birds, etc.;

the provision of such a system and method in which an indication of an
intrusion is given only after the cause of the intrusion has been determined as
resulting from the presence of a human so to avoid giving false alarms;

the provision of such a system and method to also provide a second and
lower level alarm in the event an object cannot be classified as human or
non-human so an operator/verifier is informed of the possible presence of an
intruder in the scene;

the provision of such a system and method to evaluate a series of images
of the scene and determine, for each image examined, the classification of an
object so to have an increased confidence level that an object classified as a
human is properly classified;

the provision of such a system and method in which the alarm indication
which is provided includes automatically accessing a site remote from the scene
where the intrusion occurs and transmitting an image of the scene in which the
intruder is present to the remote site;

the provision of such a system and method in which the transmitted
image is a compressed image of the scene rather than a small subset of the
image; and,

the provision of such a system and method by which a number of areas
can be continuously, reliably, and cost effectively monitored with a human
intrusion in any area being reliably detected and the appropriate alarm given.

In accordance with the invention, generally stated, a video detection
system detects the presence of an intruder in a scene from video provided by a
camera observing the scene. A recognition process differentiates between
human and non-human (animal) life forms. The presence of a human is
determined with a high degree of confidence so there is a very low probability
of false alarms. Possible false alarms resulting from the effects of noise,
aliasing, non-intruder motion occurring within the scene, and the effects of
global or local lighting are first identified and only then is object recognition
performed. Performing object recognition includes determining which regions
within the image may be an intruder, outlining and growing those regions so the
result encompasses all of what may be the intruder, determining a set of shape
features from the region and eliminating possible shadow effects, normalizing
the set of features and comparing the resulting set with sets of features for
humans and non-human (animal) life forms. The result of the comparison
produces a confidence level as to whether or not the intruder is a human. If the
confidence level is sufficiently high, an alarm is given. By performing object
classification in this manner, the further possibility that a false alarm may
occur due to the presence of an animal, or a non-identifiable object in the
scene, is also substantially eliminated.

Other objects and features will be in part apparent and in part pointed out
hereinafter.

Brief Description of Drawings

In the drawings, Fig. 1 is a simplified block diagram of a video security
system of the present invention for viewing a scene and determining the
presence of an intruder in the scene;

Fig. 2 is a representation of an actual scene viewed by a camera of the
system;

Fig. 3 is the same scene as Fig. 2 but with the presence of an intruder;

Fig. 4 is a representation of another actual scene under one lighting
condition;

Fig. 5 is a representation of the same scene under different lighting
conditions and with no intruder in the scene;

Fig. 6A is a representation of the object in Fig. 3 including its shadow;
Fig. 6B illustrates outlining and segmentation of the object; and Fig. 6C
illustrates the object with its shadow removed and as resampled for determining
a set of features for the object;

Figs. 7A-7C represent non-human (animal) life forms with which
features of the object are compared to determine if the object represents a
human or non-human life form and wherein Fig. 7A represents a cat, Fig. 7B a
dog, and Fig. 7C a bird;

Fig. 8 is a simplified time line indicating intervals at which images of
the scene are viewed by the camera system;

Fig. 9 represents a pixel array such as forms a portion of an image; and,

Fig. 10 illustrates masking of an image for those areas within a scene
where fixed objects having an associated movement or lighting change are
located.

Corresponding reference characters indicate corresponding parts 
throughout the drawings. 
Best Mode for Carrying Out the Invention

Referring to the drawings, a video security system of the invention is
indicated generally 10 in Fig. 1. The system employs one or more cameras
C1-Cn each of which continually views a respective scene and produces a signal
representative of the scene. The cameras may operate in the visual or infrared
portions of the light spectrum and a video output signal of each camera is supplied
to a processor means 12. Means 12 processes each received signal from a camera
to produce an image represented by the signal and compares the image
representing the scene at one point in time with a similar image of the scene at a
previous point in time. The signal from the imaging means represented by the
cameras may be either an analog or digital signal, and processing means 12 may
be an analog, digital, or hybrid processor.

In Fig. 2, an image of a scene is shown, the representation being the actual
image produced by a camera C. Fig. 2 represents, for example, a reference image
of the scene. Fig. 3 is an image exactly the same as that in Fig. 2 except that now
a person (human intruder) has been introduced into the scene. Fig. 3 is again an
actual image produced by a camera C. Similarly, Fig. 4 represents a reference
image of a scene, and Fig. 5 a later image in which there is a lighting change but
not an intrusion. The system and method of the invention operate to identify the
presence of such a human intruder and provide an appropriate alarm. However, it
is also a principal feature of the invention to not produce false alarms. As
described herein and in the referenced co-pending application, there are numerous
sources of false alarms and, using a series of algorithms employed by the
invention, these sources are identified for what they are so no false alarms are
given.

Operation of the invention is such that segments of an image (Fig. 3, Fig.
5) which differ from segments of an earlier image (Fig. 2, Fig. 4) are identified. A
discriminator means 14 evaluates those segments to determine if the differences
are caused by a local lighting change within the scene (Fig. 5), or the movement
of an intruder within the scene (Fig. 3). As noted, if the change is caused by an
intruder, an alarm is given. But, if the differences result from global or local
lighting changes, the effects of motion of objects established within the scene,
noise, and aliasing effects, these are recognized as such so a false alarm is not
given. Detection of local lighting changes such as shown in Fig. 5 is described in
the referenced co-pending application.

Generally, a single processor can handle several cameras positioned at
different locations within a protected site. In use, the processor cycles through
the different cameras, visiting each at a predetermined interval. At system
power-up, the processor cycles through all of the cameras doing a self-test on
each. One important test at this time is to record a reference frame against
which later frames will be compared. A histogram of pixel values is formed
from this reference frame. If the histogram is too narrow, a message is sent to
the effect that this camera is obscured and will not be used. This is done to guard
against the possibility of someone obscuring the camera while it is off by
physically blocking the lens with an object or by spray-painting it. If a camera
is so obscured, then all the pixel values will be very nearly the same and this
will show up in the histogram. Although the camera is now prevented from
participating in the security system, the system operator is informed that
something is amiss at that particular location so the problem can be investigated.
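
The power-up histogram test described above might be sketched as follows. This is only an illustration: the function names, the 8-bit gray-level range, and the `min_width` threshold are assumptions made for the example, and the text does not state how "too narrow" is measured.

```python
def histogram(frame, levels=256):
    """Count how many pixels take each gray level (0 .. levels-1)."""
    counts = [0] * levels
    for row in frame:
        for value in row:
            counts[value] += 1
    return counts


def histogram_width(counts):
    """Span between the lowest and highest occupied gray levels."""
    occupied = [level for level, n in enumerate(counts) if n > 0]
    return occupied[-1] - occupied[0] if occupied else 0


def camera_is_obscured(frame, min_width=16):
    # min_width is an illustrative threshold, not a value from the text.
    # A blocked or spray-painted lens yields nearly identical pixel values,
    # so the occupied part of the histogram collapses to a narrow band.
    return histogram_width(histogram(frame)) < min_width


blocked = [[100, 101], [100, 102]]   # nearly uniform reference frame
normal = [[10, 200], [90, 150]]      # ordinary scene with contrast
print(camera_is_obscured(blocked), camera_is_obscured(normal))
```

A real system would more likely use a robust width measure (for example a percentile range) so that a few stray pixels do not hide an obscured lens.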

In accordance with the method, a reference frame f1 is created.
Throughout the monitoring operation, this reference frame is continuously
updated if there is no perceived motion within the latest image against which a
reference image is compared. At each subsequent visit to the camera a new
frame f2 is produced and subtracted from the reference. If the difference is not
significant, the system goes on to the next camera. However, if there is a
difference, frame f2 is stored and a third frame f3 is created on the next visit and
compared to both frames f1 and f2. Only if there is a significant difference
between frames f3 and f2 and also frames f3 and f1, is further processing done.
This three frame procedure eliminates false alarms resulting from sudden,
global light changes such as caused by lightning flashes or interior lights going
on or off. A lightning flash occurring during frame f2 will be gone by frame f3,
so there will be no significant difference between frames f3 and f1. On the other
hand, if the interior lights have simply gone on or off between frames f1 and f2,
there will be no significant changes between frames f2 and f3. In either
instance, the system proceeds on to the next camera with no more processing.
Significant differences between frames f1 and f2, frames f3 and f2, and frames
f3 and f1 indicate a possible intrusion requiring more processing.
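
The three-frame decision can be illustrated with a short sketch. The definition of a "significant" difference used here (a count of pixels whose absolute difference exceeds an intensity threshold) and both threshold values are assumptions for the example; the text only requires that all three pairwise differences be significant.

```python
def changed_pixels(a, b, intensity_threshold):
    """Number of pixels whose absolute difference exceeds the threshold."""
    return sum(
        1
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
        if abs(pa - pb) > intensity_threshold
    )


def significant(a, b, intensity_threshold=20, count_threshold=2):
    # Both thresholds are illustrative values, not taken from the text.
    return changed_pixels(a, b, intensity_threshold) > count_threshold


def needs_further_processing(f1, f2, f3):
    """f1 = reference frame, f2 = frame that first differed, f3 = next visit."""
    if not significant(f1, f2):
        return False          # nothing changed: go on to the next camera
    # A flash is gone by f3 (f3 matches f1); a light switch persists (f3 matches f2).
    return significant(f3, f2) and significant(f3, f1)


def uniform(value, size=4):
    return [[value] * size for _ in range(size)]


f1 = uniform(50)
flash = needs_further_processing(f1, uniform(250), uniform(50))
lights_on = needs_further_processing(f1, uniform(200), uniform(200))

moving_f2 = [row[:] for row in f1]
moving_f3 = [row[:] for row in f1]
for y in (0, 1):
    moving_f2[y][0] = moving_f2[y][1] = 200   # object on the left in f2
    moving_f3[y][2] = moving_f3[y][3] = 200   # object has moved right in f3
intruder = needs_further_processing(f1, moving_f2, moving_f3)
print(flash, lights_on, intruder)
```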

Besides global lighting changes occurring between the images,
non-intruder motion occurring within the scene is also identified so as not to
trigger processing or cause false alarms. Thus, for example, if the fan shown in
the lower left portion of Figs. 4 and 5 were running, movement of the fan blades
would also appear as a change from one image to another. Similarly, if the fan is
an oscillating fan, its sweeping movement would also be detected as a difference
from one image to another. As described hereinafter, and as shown in Fig. 10,
where an object having an associated movement is generally fixed within the
scene and its movement is spatially constrained, the area where this movement
occurs is identified and masked so, in most instances, motion effects resulting
from operation of the object (fan) are disregarded. However, if the motion of an
intruder overlaps the masked area, the difference from one image to another is
identified and further processing, including the normally masked area, takes
place. It will be understood that there are a variety of such sources of apparent
motion which are identified and masked. Besides the fan, there are clocks, both
digital and those having hands. In one instance, the numerical display of time
changes; in the other instance, the hands of the clock (particularly the second
hand) have a noticeable movement. Computers with screen savers may have a
constantly changing image on their monitors. In manufacturing areas, different
pieces of equipment, rotating or reciprocating machinery, robotic arms, etc.,
all exhibit movements which can be identified and accounted for during
processing.
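
As a rough illustration of masking, the sketch below zeroes difference pixels inside a masked rectangle and checks whether a detected object overlaps the mask. The rectangular mask shape and all function names are assumptions for the example; the text does not specify how masked areas are represented.

```python
def rectangle_mask(height, width, top, left, bottom, right):
    """Boolean mask that is True inside rows [top, bottom) and columns [left, right)."""
    return [
        [top <= y < bottom and left <= x < right for x in range(width)]
        for y in range(height)
    ]


def apply_mask(diff, mask):
    """Zero the difference image wherever the mask is True."""
    return [
        [0 if masked else d for d, masked in zip(diff_row, mask_row)]
        for diff_row, mask_row in zip(diff, mask)
    ]


def overlaps_mask(pixels, mask):
    """True if any (row, col) pixel of a detected object lies in the masked area."""
    return any(mask[y][x] for y, x in pixels)


diff = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
fan_area = rectangle_mask(3, 3, 0, 0, 2, 2)    # hypothetical fan location
masked_diff = apply_mask(diff, fan_area)
print(masked_diff)
```

When `overlaps_mask` reports an overlap, a caller could fall back to the unmasked difference image, which corresponds to the case described above where processing includes the normally masked area.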

Any video alert system which uses frame-to-frame changes in the video
to detect intrusions into a secured area is also vulnerable to false alarms from
the inadvertent (passing automobile lights, etc.) or deliberate (police or security
guard flashlights) introduction of light into the area, even though no one has
physically entered the area. The system and method of the invention
differentiate between a change in a video frame due to a change in the
illumination of the surfaces in the FOV (field of view) as in Fig. 5, and a change
due to the introduction of a new reflecting surface in the FOV as in Fig. 3. The
former is then rejected as a light "intrusion" requiring no alarm, whereas the
latter is identified as a human intruder for which an alarm is given. It is
important to remember that only the presence of a human intruder is of
consequence to the security system; everything else constitutes a false alarm. It
is the capability of the system and method of the invention to yield a high
probability of detection of the presence of a human, while having a low
probability of false alarms, which constitutes a technically differentiated video
security system. The video processing means of the present invention can also
defeat the artifacts of noise, aliasing, screen savers, oscillating fans, drapery
blown by air flow through vents, etc.
ALGORITHM PROCESS STEPS 

The complete algorithm processes that are implemented by the method
of the present invention are as follows:
Antialiasing;
Detection (Differencing and Thresholding);
Outlining;
Region Grower Segmentation;
Noise removal;
Shadow removal;
Tests for global and local lighting changes;
Masking;
Shape features;
Fourier Descriptors;
Object classification.

ANTIALIASING PROCESS 

Aliasing is caused by sampling at or near the intrinsic resolution
of the system. As the system is sampled at or near the Nyquist frequency, the
video, on a frame by frame basis, appears to scintillate, and certain areas will
produce Moire-like effects. Subtraction on a frame by frame basis would cause
multiple detections on scenes that are unchanging. In many applications where
this occurs it is not economically possible to oversample. Elimination of
aliasing effects is accomplished by convolving the image with an equivalent
two-dimensional (2D) smoothing filter. Whether this is a 3 x 3 or 5 x 5 filter, or
a larger filter, is a matter of preference, as are the weights of the filter.
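
A minimal sketch of the smoothing step is given below, assuming a 3 x 3 box filter with equal weights and edge replication at the image border. Both choices are illustrative, since the text leaves filter size and weights as a matter of preference.

```python
def smooth(image, kernel):
    """Apply a small odd-sized smoothing kernel, normalized by the sum of its weights.

    The kernel is assumed symmetric, so correlation and convolution coincide.
    Pixels outside the image are taken from the nearest edge pixel.
    """
    height, width = len(image), len(image[0])
    k = len(kernel) // 2
    total = sum(sum(row) for row in kernel)
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    acc += kernel[dy + k][dx + k] * image[yy][xx]
            out[y][x] = acc / total
    return out


BOX_3X3 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]   # assumed equal weights

spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]      # single scintillating pixel
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
smoothed_spike = smooth(spike, BOX_3X3)
smoothed_flat = smooth(flat, BOX_3X3)
print(smoothed_spike[1][1], smoothed_flat[0][0])
```

The isolated spike is spread over its neighborhood, which is what suppresses frame-to-frame scintillation before differencing, while a uniform area is left unchanged.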
DETECTION PROCESS 

The detection process consists of comparing the current image to a
reference image. To initialize the system it is assumed that the operator has
control over the scene and, therefore, will select a single frame for the reference
when there is nothing present. (If necessary, up to 60 successive frames can be
selected and integrated together to obtain an averaged reference image.) As
shown in Fig. 1, apparatus 10 employs multiple cameras C1-Cn, but the
methodology with respect to one camera is applicable for all cameras. For each
camera, an image is periodically selected and the absolute difference between
the current image (suitably convolved with the antialiasing filter) and the
reference is determined. The difference image is then thresholded (an intensity
threshold) and all of the pixels exceeding the threshold are accumulated. This
step eliminates a significant number of pixels that otherwise would result in a
non-zero result simply by differencing the two images. Making this threshold
value adaptive within a given range of threshold values ensures consistent
performance. If the count of the pixels exceeding the intensity threshold
exceeds a pixel count threshold, then a potential detection has occurred. At this
time, all connected hit pixels (pixels that exceed the intensity threshold) are
segmented, and a count of each segmented object is taken. If the pixel count of
any object exceeds another pixel count threshold, then a detection is declared.
Accordingly, a detection is defined as the total number of hit pixels in the
absolute difference image being large and there being a large connected object
in the absolute difference image.
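
The two-stage detection rule can be sketched as follows. All three threshold values are placeholders chosen for the example (the text describes the intensity threshold as adaptive but gives no values), and 8-connectivity is assumed for "connected" hit pixels.

```python
def absolute_difference(current, reference):
    return [
        [abs(c - r) for c, r in zip(cur_row, ref_row)]
        for cur_row, ref_row in zip(current, reference)
    ]


def threshold_hits(diff, intensity_threshold):
    """Keep difference values above the intensity threshold; zero the rest."""
    return [[d if d > intensity_threshold else 0 for d in row] for row in diff]


def largest_object_size(hits):
    """Pixel count of the largest 8-connected group of hit pixels."""
    height, width = len(hits), len(hits[0])
    seen = [[False] * width for _ in range(height)]
    best = 0
    for y in range(height):
        for x in range(width):
            if hits[y][x] == 0 or seen[y][x]:
                continue
            seen[y][x] = True
            stack, size = [(y, x)], 0
            while stack:
                cy, cx = stack.pop()
                size += 1
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and hits[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            best = max(best, size)
    return best


def detect(current, reference, intensity_threshold=25,
           total_threshold=6, object_threshold=4):
    # Illustrative thresholds only; none of these values come from the text.
    hits = threshold_hits(absolute_difference(current, reference), intensity_threshold)
    total = sum(1 for row in hits for d in row if d)
    if total <= total_threshold:
        return False                      # too few hit pixels overall
    return largest_object_size(hits) > object_threshold


reference = [[50] * 6 for _ in range(6)]
with_object = [row[:] for row in reference]
for y in range(1, 4):
    for x in range(1, 4):
        with_object[y][x] = 200           # one 3 x 3 connected change
scattered = [row[:] for row in reference]
for y, x in [(0, 0), (0, 2), (0, 4), (2, 0), (2, 2), (2, 4), (4, 0), (4, 2)]:
    scattered[y][x] = 200                 # eight isolated noise-like hits
print(detect(with_object, reference), detect(scattered, reference))
```

The scattered case passes the total-count test but fails the connected-object test, which is the behavior the definition of a detection is meant to produce.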

With respect to noise, the key to rejecting noise induced artifacts is their 
size. Noise induced detections are generally spatially small and distributed 
randomly throughout the image. The basis for removing these events is to 
ascertain the size (area) of connected pixels that exceed the threshold set for 
detection. To achieve this, the region where the detected pixels occur is grown 
into connected "blobs". This is done by region growing the blobs. After region 
growing, those blobs that are smaller in size than a given size threshold are 
removed as false alarms. 
REGION GROWER SEGMENTATION 

Typically, a region growing algorithm starts with a search for the first
object pixel as the outlining algorithm does. Since searching and outlining have
already been performed, and since the outline pixels are part of the segmented
object, these do not need to be region grown again. Outline pixel arrays are
now placed on a stack, and the outline pixels are zeroed out in the absolute
difference image. A pixel is then selected (removed from the stack). The selected
pixel P and all of its eight neighbors P1-P8 (see Fig. 9) are examined to see if
hit points occur (i.e., they are non-zero). If a neighbor pixel is non-zero, then it
is added to the stack and zeroed out in the absolute difference image. Note that
for region growing, all eight neighboring pixels are examined, whereas in
outlining, the examination of neighboring pixels stops as soon as an edge pixel
is found. Thus, in outlining, as few as one neighbor may be investigated. The
region growing segmentation process stops once the stack is empty.
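
The stack-based procedure just described might look like the sketch below. It assumes the outline has already been traced and is supplied as a list of (row, column) pairs; the function name and data layout are choices made for the illustration.

```python
def grow_region(diff, outline):
    """Grow a segmented object inward from its outline.

    diff    -- thresholded absolute difference image (non-zero means hit);
               it is modified in place, with visited pixels zeroed out
    outline -- (row, col) pixels already found by the outlining step
    Returns the list of all pixels belonging to the object.
    """
    height, width = len(diff), len(diff[0])
    stack = list(outline)
    region = list(outline)
    for y, x in outline:
        diff[y][x] = 0                    # outline pixels need not be grown again
    while stack:
        y, x = stack.pop()
        # Examine all eight neighbors (the center itself is already zero).
        for ny in range(y - 1, y + 2):
            for nx in range(x - 1, x + 2):
                if 0 <= ny < height and 0 <= nx < width and diff[ny][nx] != 0:
                    diff[ny][nx] = 0
                    stack.append((ny, nx))
                    region.append((ny, nx))
    return region


image = [[0] * 6 for _ in range(6)]
for y in range(1, 4):
    for x in range(1, 4):
        image[y][x] = 80                  # one 3 x 3 object
image[5][5] = 80                          # separate, unconnected hit
outline = [(1, 1), (1, 2), (1, 3), (2, 3), (3, 3), (3, 2), (3, 1), (2, 1)]
region = grow_region(image, outline)
print(len(region), image[5][5])
```

Zeroing each pixel as it is pushed guarantees that no pixel enters the stack twice, so the loop ends when the stack empties. The region's pixel count is then available for the size test used in noise removal.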

One way to achieve the desired discrimination is to use an elaboration of
the retinex theory introduced by Edwin Land some 25 years ago. Land's theory
was introduced to explain why human observers are readily able to identify
differences in surface lightness despite greatly varying illumination across a
scene. Although the following discussion is with regard to a human observer,
it will be understood that besides human vision, Land's theory is also applicable
to viewing systems which function in place of a human viewer. According to
the theory, even if the amount of energy reflected (incident energy times surface
reflectance) from two different surfaces is the same, an observer can detect
differences in the two surface lightnesses if such a difference exists. In other
words, the human visual system has a remarkable ability to see surface
differences and ignore lighting differences. Land's hypothesis was that this
ability derives from comparison of received energies across boundaries in the
scene. Right at any boundary, light gradients make no difference because the
energies received from adjacent regions on opposite sides of a boundary are in
the correct ratio (the same as the ratio of reflectances). Furthermore, correct
judgments about lightnesses of widely separated regions are made by a serial
process of comparisons across intervening regions. At first the theory was
applied only to black and white scenes. Subsequently, it was extended to color
vision by assuming that three separate retinex systems judge the lightness of
surfaces in the three primary colors (red, green and blue). The retinex theory of
color vision is able to explain why surface colors appear very stable to humans
even though the nature of the illumination may change through a wide range.

It is the ability to discern surface differences and ignore lighting changes
which is incorporated into the video security system and method of the present
invention. Therefore, whether or not Land's theory correctly explains the way
human vision operates, use of his concepts in the present invention makes the
system and method immune to light "intrusions".

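WO 98/28706    PCT/US97/24163

A video signal (gray level) for any pixel is given by

\[ g \propto \int E(\lambda)\, r(\lambda)\, S(\lambda)\, d\lambda \tag{1} \]

where $E(\lambda)$ = scene spectral irradiance at the pixel in question,
$r(\lambda)$ = scene spectral reflectance at the pixel in question, and
$S(\lambda)$ = sensor spectral response.

The constant of proportionality in (1) depends on geometry and camera
characteristics, but is basically the same for all pixels in the frame.
The ratio of video signals for two adjacent pixels is:

\[ \frac{g_1}{g_2} = \frac{\int E_1(\lambda)\, r_1(\lambda)\, S(\lambda)\, d\lambda}{\int E_2(\lambda)\, r_2(\lambda)\, S(\lambda)\, d\lambda} \approx \frac{\int E(\lambda)\, r_1(\lambda)\, S(\lambda)\, d\lambda}{\int E(\lambda)\, r_2(\lambda)\, S(\lambda)\, d\lambda} \tag{2} \]

where we have used Land's approximation that the scene irradiance does not
vary significantly between adjacent pixels: $E_1(\lambda) \approx E_2(\lambda) \approx E(\lambda)$.
Assuming that the spectral reflectances are nearly constant over the spectral
response of the camera, then $r_k(\lambda) \approx r_k$ for $k = 1, 2$ and

\[ \frac{g_1}{g_2} \approx \frac{r_1 \int E(\lambda)\, S(\lambda)\, d\lambda}{r_2 \int E(\lambda)\, S(\lambda)\, d\lambda} = \frac{r_1}{r_2} \tag{3} \]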

In other words, for the conditions specified, ratios of adjacent pixel
values satisfy the requirement of being determined by scene reflectances only
and are independent of scene illumination. It remains to consider the
practicality of the approximations used to arrive at (3). A basic assumption in
the retinex process is that of only gradual spatial variations in the scene
irradiance; that is, we must have nearly the same irradiance of adjacent pixel
areas in the scene. This assumption is generally true for diffuse lighting, but for
directional sources it may not be. For example, the intrusion of a light beam
into the area being viewed can introduce rather sharp shadows, or change the
amount of light striking a vertical surface without similarly changing the
amount of light striking an adjacent tilted surface. In these instances, ratios
between pixels straddling the shadow line in the first instance, or the surfaces in
the second instance, will change even though no object has been introduced into
the scene. However, even in these cases, with 512 by 484 resolution, the
pixel-to-pixel change is often less than it appears to the eye, and the changes only
appear at the boundaries, not within the interiors of the shadows or surfaces. By
establishing a threshold on hits, the system can tolerate a number of these hits
without triggering an intrusion alarm.
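
The illumination independence of adjacent-pixel ratios can be checked numerically with a small sketch. Horizontal neighbors, the 10 percent tolerance, and the function names are assumptions for the example. It shows only the ratio idea, not the variable light rejection algorithm of the invention.

```python
def horizontal_ratios(frame):
    """Ratio of each pixel to its right-hand neighbor (pixel values assumed > 0)."""
    return [
        [row[x] / row[x + 1] for x in range(len(row) - 1)]
        for row in frame
    ]


def changed_ratio_count(reference, current, tolerance=0.1):
    """Count adjacent-pixel ratios that differ by more than a relative tolerance."""
    # tolerance is an illustrative value, not taken from the text.
    ref_ratios = horizontal_ratios(reference)
    cur_ratios = horizontal_ratios(current)
    return sum(
        1
        for ref_row, cur_row in zip(ref_ratios, cur_ratios)
        for a, b in zip(ref_row, cur_row)
        if abs(a - b) > tolerance * a
    )


reference = [[40, 80, 80, 20], [40, 80, 80, 20]]
brighter = [[2 * v for v in row] for row in reference]   # lighting doubled everywhere
with_object = [[40, 80, 30, 20], [40, 80, 30, 20]]       # new surface in column 2

light_hits = changed_ratio_count(reference, brighter)
object_hits = changed_ratio_count(reference, with_object)
print(light_hits, object_hits)
```

Doubling the illumination leaves every ratio unchanged, as equation (3) predicts, while a new reflecting surface alters the ratios on both of its sides. Comparing the hit count against a threshold gives the tolerance to shadow-line hits mentioned above.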

Another method, based on edge mapping, is also possible. As in the
previous situation, the edge mapping process would be employed after an initial
detection stage is triggered by pixel value changes from one frame to the next.
Within each detected "blob" area, an edge map is made for both the initial
(unchanged) frame and the changed frame that triggered the alert. Such an edge
map can be constructed by running an edge enhancement filter (such as a Sobel
filter) and then thresholding. If the intrusion is just a light change, then the
edges within the blob should be basically in the same place in both frames.
However, if the intrusion is an object, then some edges from the initial frame
will be obscured in the changed frame and some new edges, internal to the
intruding object, will be introduced.
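
A minimal sketch of the edge mapping comparison follows. It assumes small grayscale frames stored as lists of rows, uses |gx| + |gy| as the Sobel gradient magnitude, and leaves border pixels unmarked; the threshold value and the function names are hypothetical.

```python
def sobel_edge_map(img, threshold):
    """Binary edge map: Sobel magnitude (|gx| + |gy|) above a threshold."""
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            edges[y][x] = abs(gx) + abs(gy) > threshold
    return edges


def edge_disagreement(img_a, img_b, threshold):
    """Count pixels whose edge status differs between the two frames."""
    ea = sobel_edge_map(img_a, threshold)
    eb = sobel_edge_map(img_b, threshold)
    return sum(a != b for ra, rb in zip(ea, eb) for a, b in zip(ra, rb))


# Reference frame: a vertical step edge between a dark and a light region.
reference = [[50, 50, 50, 100, 100, 100] for _ in range(6)]

# Same scene under 20% brighter light (integer arithmetic keeps values exact).
brighter = [[v * 6 // 5 for v in row] for row in reference]

# Same scene with a small dark object inserted into the light region.
with_object = [row[:] for row in reference]
for y in (2, 3):
    for x in (3, 4):
        with_object[y][x] = 10

light_diff = edge_disagreement(reference, brighter, threshold=100)
object_diff = edge_disagreement(reference, with_object, threshold=100)
```

With a moderate lighting change the two edge maps agree, while the inserted object introduces new edges. A larger lighting change could push gradients across the fixed threshold, which is the weakness of this method noted in the text.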

Extensive laboratory testing revealed problems with both methods. In
particular, it is difficult to set effective thresholds with the retinex method,
because with a background and intrusive object both containing large uniform
areas, many adjacent pixel ratios of unity in both the reference frame and the
new frame are obtained. Therefore the fraction of ratios that are changed is
diluted by those which contribute no information one way or the other. On the
other hand, the edge mapping method shows undue dependence on light
changes because typical edge masks use absolute differences in pixel values.
Light changes can cause new edges to appear, or old ones to disappear, in a
binary edge map even though there is no intervening object. By exploiting
concepts from both methods, and key to this invention, an algorithm having
both good detection and false alarm performance characteristics has been
constructed. Additional system features also help eliminate light changes of
certain types which are expected to occur, so as to further enhance performance.

The basic premise of the variable light rejection algorithm used in the
method of the invention is to compare ratios of adjacent pixels from a
segmented area in frame f1 with ratios from corresponding pixels in frame f3,
but to restrict the ratios to those across significant edges. Restricting the
processing to ratios of pixels tends to reject illumination changes, and using
only edge pixels eliminates the dilution of information caused by large uniform
areas.

In implementing the algorithm:

a) Ratios R of adjacent pixels (both horizontally and vertically) in frame f1
are tested to determine if they significantly differ from unity: R - 1 > T1? or
(1/R) - 1 > T1?, where T1 is a predetermined threshold value. Every time such a
significant edge pair is found an edge count value is incremented.

b) Those pixel pairs that pass either of the tests in a) have their
corresponding ratios R' for frame f3 calculated.

c) A check is made to see if R' differs significantly from the corresponding
ratio R: |R' - R|/R > T2?, where T2 is a second predetermined threshold value.
Each time this test is passed a hit count value is incremented.

d) A test is made for new edges in frame f3 (i.e., edges not in frame f1):
R' - 1 > T1? or (1/R') - 1 > T1? Every time such a new significant edge pair is
found the edge count value is incremented again.

e) Those pixel pairs that pass either of the tests in d) have their
corresponding ratios from frame f1, R, calculated.

f) A check is made to see if ratio R' differs significantly from the
corresponding ratio R: |R' - R|/R > T2? Each time this test is passed the hit
count value is incremented again.

g) The segmented area is now deemed an intrusion if the ratio of changed
edges to the edge count value (ecv) is sufficiently large: that is, there is an
intrusion if H/ecv > T3, where H is the hit count value and T3 is a third
predetermined threshold value.

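
Steps a) through g) can be sketched as follows. This is an illustrative reading of the steps, not a definitive implementation: it assumes the segmented area is a small rectangular patch stored as lists of rows, that all pixel values are strictly positive, and that the threshold values passed in are hypothetical. The function names are likewise assumptions.

```python
def is_edge(ratio, t1):
    """Steps a) and d): ratio differs significantly from unity."""
    return ratio - 1 > t1 or (1 / ratio) - 1 > t1


def adjacent_pairs(height, width):
    """Yield coordinates of horizontally and vertically adjacent pixels."""
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                yield (y, x), (y, x + 1)
            if y + 1 < height:
                yield (y, x), (y + 1, x)


def is_intrusion(f1, f3, t1, t2, t3):
    """Return True if H/ecv > T3 for the segmented area (step g)."""
    ecv = 0   # edge count value
    hits = 0  # hit count value H
    for (ya, xa), (yb, xb) in adjacent_pairs(len(f1), len(f1[0])):
        r = f1[ya][xa] / f1[yb][xb]    # ratio R in frame f1
        rp = f3[ya][xa] / f3[yb][xb]   # ratio R' in frame f3
        changed = abs(rp - r) / r > t2
        if is_edge(r, t1):             # steps a) to c): edges in f1
            ecv += 1
            hits += int(changed)
        elif is_edge(rp, t1):          # steps d) to f): new edges in f3
            ecv += 1
            hits += int(changed)
    return ecv > 0 and hits / ecv > t3


# Reference patch with one vertical edge.
f1 = [[50, 50, 100, 100] for _ in range(4)]

# Lighting doubled everywhere: ratios across edges are unchanged.
f3_light = [[2 * v for v in row] for row in f1]

# A dark object covering part of the bright region.
f3_object = [row[:] for row in f1]
for y in (1, 2):
    for x in (2, 3):
        f3_object[y][x] = 20

light_result = is_intrusion(f1, f3_light, t1=0.2, t2=0.2, t3=0.3)
object_result = is_intrusion(f1, f3_object, t1=0.2, t2=0.2, t3=0.3)
```

In this example the lighting change produces no hits among four edge pairs, while the object produces six hits among eight edge pairs, so only the second case exceeds T3.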
SHADOW REMOVAL 

While the object is being outlined and segmented, the x and y
coordinates of the pixels outlined and segmented are accumulated. This
information is now used to calculate the centroid Z (see Fig. 6B) of the object.
Also, the minimum and maximum x and y pixel coordinates of the object are
computed at this time (see Fig. 6B). Both the centroid of the object and the
object's minimum and maximum x, y coordinate values are used in a process to
remove a shadow S (see Fig. 6A) from the object. Using the coordinate values,
and assuming that life forms exhibit compact mass shapes, pieces of the object
which stick out can be identified as a shadow and can be curtailed during
subsequent processing. For drawing simplification, object O is shown in Fig.
4B with its shadow S removed.
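
The centroid and coordinate extremes can be computed from the accumulated pixel coordinates as in the sketch below. The text does not spell out the rule used to curtail the shadow, so the sketch stops at the quantities it names; the function name and the example pixel layout are hypothetical.

```python
def centroid_and_bounds(pixels):
    """pixels: list of (x, y) coordinates accumulated during segmentation."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    centroid = (sum(xs) / len(xs), sum(ys) / len(ys))
    bounds = (min(xs), max(xs), min(ys), max(ys))
    return centroid, bounds


# A compact 3 x 5 body with a thin shadow trailing to the right
# along the bottom row (image y grows downward).
body = [(x, y) for x in range(3) for y in range(5)]
shadow = [(x, 4) for x in range(3, 8)]

centroid, bounds = centroid_and_bounds(body + shadow)
x_min, x_max, y_min, y_max = bounds

# The shadow makes the object reach much further from the centroid on
# one side than on the other, a cue that a piece "sticks out".
right_reach = x_max - centroid[0]
left_reach = centroid[0] - x_min
```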
SHAPE FEATURES 

Having outlined and region grown an object to be recognized, a series of
linear shape features and Fourier descriptors are extracted for each segmented
region. Values for shape features are numerically derived from the image of the
object based upon the x, y pixel coordinates obtained during outlining and
segmentation of the object. These features include, for example, values
representing the height of the object (y_max - y_min), its width (x_max - x_min),
horizontal and vertical edge counts, and degree of circularity. However, it will
be understood that there are a large number of factors relating to the features of
an object and that some, or all, of the above listed features can be used with
combinations of these other factors in order to classify an object. What is
important is that any feature factor selected facilitate the distinction between a
human and a non-human class of objects.
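
A few of the listed features can be sketched as follows. The text does not define the degree of circularity; the sketch assumes the common measure 4πA/P² (equal to 1 for a perfect circle), and the function name, the area and perimeter inputs, and the example numbers are hypothetical.

```python
import math


def linear_shape_features(bounds, area, perimeter):
    """bounds: (x_min, x_max, y_min, y_max); area and perimeter in pixels."""
    x_min, x_max, y_min, y_max = bounds
    return {
        "height": y_max - y_min,
        "width": x_max - x_min,
        # Assumed definition of circularity: 4*pi*A / P**2.
        "circularity": 4 * math.pi * area / perimeter ** 2,
    }


# Example: an ideal disc of radius 5 inside an arbitrary bounding box.
features = linear_shape_features(
    bounds=(0, 10, 0, 20),
    area=math.pi * 5 ** 2,
    perimeter=2 * math.pi * 5,
)
```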
FOURIER DESCRIPTORS 

Fourier descriptors represent a set of features used to recognize a
silhouette or contour of an object. As shown in Fig. 6C, the outline of an object
is resampled into equally spaced points located about the edge of the object.
The Fourier descriptors are computed by treating these points as complex points
and computing a complex FFT (Fast Fourier Transform) for the sequence.
The resulting coefficients are a function of the position, size, orientation, and
starting point P of the outline. Using these coefficients, Fourier descriptors are
extracted which are invariant to these variables. As a result of performing the
feature extractions, what remains is a set of features which now describe the
segmented object.
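
One standard way to obtain such invariant descriptors is sketched below. The text does not state which normalization is used, so this is an assumption: the zeroth coefficient is dropped (position), magnitudes are taken (orientation and starting point), and the result is divided by the magnitude of the first coefficient (size). A direct DFT replaces the FFT to keep the sketch self-contained, and the contour is assumed to be traversed so that the first coefficient is non-zero.

```python
import cmath


def dft(points):
    """Direct discrete Fourier transform of a complex sequence."""
    n = len(points)
    return [
        sum(p * cmath.exp(-2j * cmath.pi * k * m / n)
            for m, p in enumerate(points))
        for k in range(n)
    ]


def fourier_descriptors(contour, count=4):
    """contour: equally spaced (x, y) outline points.

    Returns `count` magnitudes that do not depend on the position, size,
    orientation, or starting point of the outline.
    """
    coeffs = dft([complex(x, y) for x, y in contour])
    scale = abs(coeffs[1])  # assumed non-zero for this traversal direction
    return [abs(c) / scale for c in coeffs[2:2 + count]]


# A square outline sampled at 8 equally spaced points.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]

# The same outline scaled by 3, rotated 90 degrees, shifted, and started
# from a different point.
moved = [(-3 * y + 10, 3 * x - 5) for x, y in square]
moved = moved[3:] + moved[:3]

d_square = fourier_descriptors(square)
d_moved = fourier_descriptors(moved)
```

The two descriptor lists agree to within floating point error, which is the invariance the text describes.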

FEATURE SET NORMALIZATION 

The feature set obtained as described above is now normalized. For
example, the set of features may be rescaled if the range of values for one of the
features of the object is larger or smaller than the range which the rest of the
features of the object have. Further, a test data base is established and when the
feature data is tested on this data base, a feature may be found to be skewed. To
eliminate this skewing, a mathematical function such as a logarithmic function
is applied to the feature value. To further normalize the features, each feature
value may be exercised through a linear function; that is, for example, a
constant value is added to the feature value, and the result is then multiplied by
another constant value. It will be understood that other consistent descriptors
such as wavelet coefficients and fractal dimensions can be used instead of
Fourier descriptors.
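
The normalization steps can be sketched as below, assuming the constants are chosen so that a feature's expected range maps onto 0 to 1. The function names, the choice of natural logarithm, and the example range are hypothetical.

```python
import math


def normalize_feature(value, offset, gain, use_log=False):
    """Optionally log-compress a skewed feature, then apply the linear
    function described in the text: add a constant, then multiply."""
    if use_log:
        value = math.log(value)  # requires a strictly positive value
    return (value + offset) * gain


def unit_range_constants(low, high):
    """Constants that map the interval [low, high] onto [0, 1]."""
    return -low, 1.0 / (high - low)


# Rescale a height feature expected to lie between 40 and 240 pixels.
offset, gain = unit_range_constants(40.0, 240.0)
scaled_height = normalize_feature(140.0, offset, gain)

# De-skew a feature with a logarithm before the linear step.
deskewed = normalize_feature(math.e, offset=-1.0, gain=2.0, use_log=True)
```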
OBJECT CLASSIFIER 

Having normalized a feature set, the set is now evaluated in order to
classify the object which is represented by the set. An object classifier portion
of the processor means is provided as an input the normalized feature set for the
object to be classified. The object classifier has already been provided feature
set information for humans as well as for a variety of animals (cat, dog, bird)
such as shown in Figs. 7A - 7C. These Figs. show the presence of each animal
in an actual scene as viewed by the camera of the system. By evaluating the
feature set for the object with those for humans and animals, the classifier can
determine a confidence value for each of three classes: human, animal and
unknown. Operation of the classifier includes implementation of a linear or
non-linear classifier. A linear classifier may, for example, implement a Bayes
technique, as is well known in the art. A non-linear classifier may employ, for
example, a neural net which is also well-known in the art, or its equivalent.
Regardless of the object classifier used, operation of the classifier produces a
"hard" decision as to whether the object is human, non-human, or unknown.
Further, the method involves using the algorithm to look at a series of
consecutive frames in which the object appears, perform the above described
sequence of steps for each individual frame, and integrate the results of the
separate classifications to further verify the result.
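
The text does not say how the per-frame results are integrated. One simple possibility, shown here purely as an assumption, is a majority vote over the hard decisions, with ties resolved to "unknown"; the function name and class labels are illustrative.

```python
from collections import Counter


def integrate_decisions(frame_decisions):
    """Majority vote over a non-empty list of per-frame hard decisions.

    A tie between the two most frequent classes yields "unknown".
    """
    ranked = Counter(frame_decisions).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "unknown"
    return ranked[0][0]


clear_case = integrate_decisions(["human", "unknown", "human"])
tied_case = integrate_decisions(["human", "non-human"])
```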

Depending upon the outcome of the above analysis, the processing
means, in response to the results of the object classification, provides an
indication of an intrusion if the object is classified as a human. It does not
provide any indication if the object is classified as an animal. This prevents
false alarms. It will be understood that, because an image of a scene provided
by a camera C is evaluated on a continual basis, every one-half second for
example, the fact that a human is present in the scene but is not identified as
such by the classification process at one instant does not mean
that the intrusion will be missed. Rather, it only means that the human was not
recognized as such at that instant. Because the movement of a human intruder
into and through the scene involves motion of the person's head, trunk, and
limbs, their position or posture will be recognized as those of a human, if not in
one image of the scene, then probably in the next. And, anytime the presence of
a human intrusion is continually recognized in accordance with the method of
the invention, the alarm is given. Moreover, if the result of an object
classification is unknown, an alarm indication is also given. However, the level
of the alarm is less than that for a classified human intrusion. What this lower
level alarm does is to alert security personnel that something has occurred which
may require investigation. This is important because while the system is
designed to not provide false alarms, it is also designed to not miss any human
intrusions either. Because of the manner in which the algorithm is constructed,
the possibility an object will be classified as unknown is very small. As a
result, the instances in which a low level alarm will be given are
infrequent. This is much [illegible]
there is an anomaly.
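
The alarm policy described above reduces to a small mapping from the hard decision to an alarm level, sketched here with hypothetical label strings and level names.

```python
def alarm_level(classification):
    """Map a hard classification to an alarm level.

    "human"   -> full alarm
    "unknown" -> lower level alarm for security personnel to investigate
    otherwise -> no alarm (for example an animal)
    """
    levels = {"human": "alarm", "unknown": "low-level alarm"}
    return levels.get(classification, "none")
```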

An alarm, when it is given, is [illegible] a monitoring location staffed by
[illegible], and [illegible] a number of locations can be simultaneously
monitored.

In view of the foregoing, it will be seen that the several objects of the
invention are achieved and other advantageous results are obtained.

As various changes could be made in the above constructions without
departing from the scope of the invention, it is intended that all matter contained
in the above description or shown in the accompanying drawings shall be
interpreted as illustrative and not in a limiting sense.

Claims

1. A video security system visually monitoring a scene and detecting the 
presence of a human intruder within the scene comprising imaging means 
continually viewing the scene and producing a signal representative of the scene; 
processor means processing the signal, comparing the signal representing the 
scene at one point in time with a similar signal representing the scene at a 
previous point in time, and identifying those segments of the scene at said one 
point in time which differ from segments of the scene at the earlier point in time; 
and, discriminator means evaluating those segments of the scene identified as 
being different to classify each segment as a human life form or not, to give an 
alarm whenever an object present in one of the segments is classified as a human 
life form representing a human intruder within the scene, and to give no alarm if 
objects present in the segments are classified as non-human life forms. 

2. The video security system of claim 1 wherein said discriminator means 
includes means comparing pixel elements contained in each segment of the scene 
at one point in time and corresponding pixel elements contained in a 
corresponding segment from the scene at the earlier point in time, and producing 
an outline of each segment within the later scene which differs from a 
corresponding segment in the earlier scene. 

3. The video security system of claim 2 wherein said discriminator means 
further includes means growing each segment to a size which incorporates all of 
the pixels which define an object contained within the segment. 

4. The video security system of claim 3 wherein said discriminator means 
further includes means extracting a set of features from the object. 

5. The video security system of claim 4 wherein said feature extraction 
means includes means extracting linear shape features from the object as 
numerical values representing such factors as the height, width, horizontal and 
vertical edges of the object, and degree of circularity of the object. 

6. The video security system of claim 4 wherein said feature extraction
means further includes means extracting Fourier descriptors of the silhouette
shape features of the object.

7. The video security system of claim 6 wherein said feature extraction 
means further includes means normalizing any value obtained from the feature 
extraction means in the event said value falls outside a predetermined range of 
values for the particular feature. 

8. The video security system of claim 6 wherein said discriminator means 
further includes classifier means evaluating said set of features for said object with 
sets of features representing human and non-human life forms and for deriving a 
value representing a degree of confidence as to the correspondence of the object to 
a human or non-human life form. 

9. The video security system of claim 8 further including means providing an 
alarm indication only if the degree of confidence for the correspondence of the 
object to a human life form exceeds a predetermined confidence level. 

10. The video security system of claim 8 wherein said classifier means 
includes a linear object classification means providing a confidence level output 
for each of the three classes: human, animal, and unknown. 

11. The video security system of claim 8 wherein said classifier means 
includes a non-linear object classification means providing a confidence level 
output for each of three classes: human, animal, and unknown. 

12. The video security system of claim 1 wherein said discrimination means 
includes means executing an algorithm to perform object classification. 

13. The video security system of claim 4 wherein said feature extraction 
means includes means eliminating shadows cast by an object represented by the 
segment. 

14. The video security system of claim 9 wherein said alarm indication means 
further provides a second alarm indication if an object is classified as unknown. 

15. A video security system visually monitoring a scene and detecting motion
of an object within the scene comprising imaging means continually viewing the
scene and producing a signal representative of the scene; processor means
processing said signal, comparing the signal representing the scene at one point in
time with a similar signal representing the scene at a previous point in time, and
identifying those segments of the scene at said one point in time which differ from
segments of the scene at the earlier point in time; and, discriminator means
evaluating those segments of the scene identified as being different to determine if
the differences are caused by surface differences which are indicative of the
presence of an intruder within the scene, or lighting changes which occur within
the scene and do not indicate the presence of an intruder, and if the difference is
caused by the presence of an intruder providing an indication thereof, said
discriminator means including means comparing pixel elements contained in each
segment of the scene at the one point in time and corresponding pixel elements
contained in a corresponding segment from the scene at the earlier point in time,
and means determining a ratio of light intensity between each pixel in a segment
with each pixel adjacent thereto, and means comparing the ratio values for the
pixels in the segment of the scene at one point in time with the ratio values for the
pixels in the corresponding segment of the scene at the earlier point in time.

16. A method of evaluating a scene to determine if any perceived movement
within the scene is caused by an intruder into the scene comprising viewing the
scene and creating an image of the scene, said image of said scene comprising a
plurality of pixels arranged in an array; comparing the image of the scene with a
reference image thereof to produce a difference image, producing said difference
image including convolving the image with an antialiasing means to eliminate any
aliasing effects in the resulting difference image, outlining any segments where a
possible movement has occurred, determining a ratio of light intensity between
each pixel in a segment with each pixel adjacent thereto, and comparing the ratio
values for the pixels in a segment of one image with the ratio values for the pixels
in the corresponding segment of the other image; processing the difference image to
identify any segments therewithin which, based upon a first predetermined set of
criteria, represent spatially constrained movements of an object fixed within the
scene, and further processing the difference image to identify any segments
therewithin which, based upon a second predetermined set of criteria, represent
artifacts not caused by the presence of an intruder within the scene, said segments
meeting said first and second sets of criteria being identified as segments not
requiring further processing; and, further processing those segments within the
difference image which remain to determine if movement therewithin is caused by
an intruder.